Features for Named Entity Recognition in Czech Language
نویسنده
چکیده
This paper deals with Named Entity Recognition (NER). Our work focuses on the application for the Czech News Agency (ČTK). We propose and implement a Czech NER system that facilitates the data searching from the ČTK text news databases. The choice of the feature set is crucial for the NER task. The main contribution of this work is thus to propose and evaluate some different features for the named entity recognition and to create an “optimal” set of features. We use Conditional Random Fields (CRFs) as a classifier. Our system is tested on a Czech NER corpus with nine main named entity classes. We reached 58% of the F-measure with the best feature set which is sufficient for our target application.
منابع مشابه
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...
متن کاملبهبود شناسایی موجودیتهای نامدار فارسی با استفاده از کسره اضافه
Named entity recognition is a process in which the people’s names, name of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), date, currency and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...
متن کاملسیستم شناسایی و طبقهبندی موجودیتهای اسمی در متون زبان فارسی بر پایه شبکه عصبی
Named Entity Recognition (NER) is a fundamental task in natural language processing and also known as a subset of information extraction. We seek to locate and classify named entities in text into predefined categories such as the names of persons, organizations, locations, expressions of times, etc. Named Entity Recognition for English texts has been researched widely for the past years, howev...
متن کاملNeural Network Based Named Entity Recognition
Czech named entity recognition (the task of automatic identification and classification of proper names in text, such as names of people, locations and organizations) has become a well-established field since the publication of the Czech Named Entity Corpus (CNEC). This doctoral thesis presents the author’s research of named entity recognition, mainly in the Czech language. It presents work and...
متن کاملMaximum Entropy Named Entity Recognition for Czech Language
Named Entity Recognition (NER) is an important preprocessing tool for many Natural Language Processing tasks like Information Retrieval, Question Answering or Machine Translation. This paper is focused on NER for Czech language. The proposed NER is based on knowledge and experiences acquired on other languages and adapted for Czech. Our recognizer outperforms the previously introduced recognize...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011